A Data Driven Approach to Audiovisual Speech Mapping

Authors

  • Andrew Abel
  • Ricard Marxer
  • Jon Barker
  • Roger Watt
  • Bill Whitmer
  • Peter Derleth
  • Amir Hussain
Abstract

The use of visual information in audio speech processing has attracted significant recent interest. This paper presents a data-driven approach that estimates audio speech acoustics from temporal visual information alone, without considering linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various MLP configurations and datasets are compared to identify the best-performing setup. The results show that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be produced.
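As a rough illustration of the pipeline the abstract outlines, the sketch below computes low-frequency 2D-DCT coefficients from a mouth-region image and pairs each audio frame with a window of preceding visual frames, producing the kind of input/target pairs an MLP regressor could be trained on. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `keep` and `context` parameters, the orthonormal DCT-II scaling, and the assumption that audio and visual frames share one frame rate are all hypothetical choices made for illustration.

```python
import math

def dct2_1d(x):
    """Orthonormal DCT-II of a 1D sequence (scaling is an assumption)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct2_2d(image):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct2_1d(r) for r in image]
    cols = [dct2_1d(list(c)) for c in zip(*rows)]
    # cols[v][u] holds frequency u of column v; transpose back to [u][v].
    return [list(r) for r in zip(*cols)]

def visual_features(image, keep=2):
    """Keep the top-left keep x keep block of low-frequency coefficients.

    `keep` is a hypothetical parameter; the abstract does not say how many
    coefficients are retained.
    """
    coeffs = dct2_2d(image)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

def make_training_pairs(visual_frames, audio_frames, context=3):
    """Pair audio frame t with the `context` visual frames preceding it.

    Assumes both streams are already aligned to the same frame rate and that
    "prior" means frames t-context .. t-1 (an interpretation, not a fact
    stated in the abstract).
    """
    pairs = []
    for t in range(context, len(audio_frames)):
        window = [v for frame in visual_frames[t - context:t] for v in frame]
        pairs.append((window, audio_frames[t]))
    return pairs
```

Each pair holds a flattened visual window as the regressor input and one audio feature vector (for example, a log filterbank frame) as the target. The window length and the number of retained coefficients would be tuned alongside the MLP configuration.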

Similar resources

Automatic Viseme Clustering for Audiovisual Speech Synthesis

A common approach in visual speech synthesis is the use of visemes as atomic units of speech. In this paper, phoneme-based and viseme-based audiovisual speech synthesis techniques are compared in order to explore the balance between data availability and improved audiovisual coherence for synthesis optimization. A technique for automatic viseme clustering is described and it is compared to ...

The Cortical Representation of the Speech Envelope is Earlier for Audiovisual Speech than Audio Speech

Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that electro- and magnetoencephalographic (EEG/MEG) data are usually analyzed in response to simple, discret...

The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech.

Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research ha...

A clustering approach for mineral potential mapping: A deposit-scale porphyry copper exploration targeting

This work describes a knowledge-guided clustering approach for mineral potential mapping (MPM), by which the optimum number of clusters is derived from a knowledge-driven methodology through a concentration-area (C-A) multifractal analysis. To implement the proposed approach, a case study at the North Narbaghi region in Saveh, Markazi province, Iran, was investigated to discover porphyry ...

Face Synthesis Driven by Audio Speech Input Based on HMMs

In this paper, an HMM-based visual speech system driven by audio speech input is designed to render a face model while synchronous audio is played. Our approach differs considerably from the methods adopted by other researchers. We first train the models for every final and initial in Mandarin. In this process, a large quantity of audio training data under differe...


Publication date: 2016